-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: capture errors that shouldnt cause the whole plan to be discarded during network resolution #444
base: transition-to-runkit
Are you sure you want to change the base?
Conversation
d0b83eb
to
63070d9
Compare
72722c1
to
d070e92
Compare
0121c5a
to
a55ca32
Compare
a55ca32
to
b66c087
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that is not clear to me, when trying to deploy the below composition there is an error with second deployment and on_error: commit
. The deployer will still deploy first and second child?
class Parent < Compositions
add Model1, as: "first"
add Model2, as "second"
add Model3, as "third"
...
end
If a child of a parent fails the deployment validation, itself and all of its parents also fail, therefore the other children will also fail. PS: |
571f5dd
to
0eafc30
Compare
This adapts the exceptions that previously refered with multiple tasks to the error capture system, that requires each exception to refer to a single task. The error messages were adapted for a single task context, but unfortunately they can still a bit too verbose.
feature flag This creates the structures that should be used to capture errors. One can retain the previous behavior by using the RaiseErrorHandler, or use the error capture via the ResolutionErrorHandler. The feature flag is meant to be used to switch between the two options.
0eafc30
to
25b6040
Compare
… them Through the resolution error handler, capture the errors during the network generation. Then, they are propagated as failed events for each individual task that caused the error. The previous behavior is enabled when the capture_errors_during_network_resolution feature flag is off. That way, any exception is immediately raised. Some behaviors were slightly changed to work for both the old and new features, specially for unit tests.
25b6040
to
0db91ad
Compare
0db91ad
to
bc251de
Compare
errors Whenever resolution errors are reported, it will emit failed for each individual task AND raise PartialNetworkResolution since the profile assertions require an exception to fail
The error capture introduces a slight change in the engine interface, so we need to adapt these so they keep working as before.
729b796
to
aa31a18
Compare
Now that we can capture errors, we have to add more information on the result of the application of the generated network to the plan. This is done by changing the return of #syskit_apply_resolution_results to a struct that has information of the errors and the planned instances.
aa31a18
to
3b564b1
Compare
core early_deploy: true | ||
core capture_errors_during_network_resolution: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should run with both activated as well ... no ?
validate_deployed_network: (true if Syskit.conf.early_deploy?), | ||
early_deploy: Syskit.conf.early_deploy? | ||
validate_deployed_network: Syskit.conf.early_deploy?, | ||
early_deploy: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change from Syskit.conf.early_deploy?
, especially given that you kept it for validate_deployed_network
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to make early_deploy the default for validate_deployed_network
early_deploy: Syskit.conf.early_deploy?,
validate_deployed_network: early_deploy
|
||
if on_error == :save | ||
errors.each do |e| | ||
handle_resolution_exception(e, on_error: on_error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really does not make sense to call this more than once.
Also, the (very little) difference in logic between handle_resolution_errors and handle_resolution_exception looks suspicious.
) | ||
|
||
unless resolution_errors.empty? || cleanup_resolution_errors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find that unless(complex logic)
is very hard to read ... I'd keep it for simple logic (only one boolean)
Here, if !resolution_errors.empty? && !cleanup_resolution_errors
elsif on_error == :commit | ||
work_plan.commit_transaction | ||
else | ||
errors.each { |err| real_plan.execution_engine.add_error(err) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very happy that you modify real_plan
from here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conclusion: should be removed.
resolution_errors.each do |error| | ||
t = error.planned_task | ||
expect_execution do | ||
t.failed_event.emit(error.original_exceptions) | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to invert, that is run the loop inside the expect_execution
And replace expect_execution by execute. expect_execution without a to
block actually does nothing
@@ -227,31 +238,32 @@ def syskit_deploy_normalized_placeholder_tasks( | |||
.to { emit(*not_running.map(&:start_event)) } | |||
|
|||
resolve_options = Hash[on_error: :commit].merge(resolve_options) | |||
begin | |||
resolution_errors = execute do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the need for execute
here ?
# formatted = formatted.gsub(/<id:\d+>/, "<id:X>") | ||
# .gsub(/Class:0x[0-9a-f]+/, | ||
# "Class:0xXXXXXX") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover
# assert_equal expected, | ||
# formatted.gsub(/<id:\d+>/, "<id:X>") | ||
# .gsub(/Class:0x[0-9a-f]+>:0x[0-9a-f]+>/, | ||
# "Class:0xXXXXXX>:0xXXXXXX>") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover
t.success_event.emit | ||
end | ||
nil | ||
resolution_apply_result.errors.group_by(&:planning_task).each do |task, e| | ||
task.failed_event.emit e.flat_map(&:original_exception) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
task
is not guaranteed to be running here. Might be from a previous deploy.
ensure | ||
syskit_current_resolution_keepalive.discard_transaction | ||
@syskit_current_resolution = nil | ||
end | ||
|
||
running_requirement_tasks.each do |t| | ||
resolution_apply_result.instances.each_key do |t| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of finishing InstanceRequirementsTask when the resolution finishes, my proposition would be to
- have an intermediate event
resolution_success
, which gets emitted here instead ofsuccess
- change the logic that finds what needs to be deployed to only pick the running InstanceRequirementsTask
- change the logic that finds whether there are new requirements to check if there are running InstanceRequirementsTask without the resolution_success event emitted (resolution_success_event.emitted?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic in runtime/apply_deployment_tasks_from_plan.rb
rescue ::Exception => e # rubocop:disable Lint/RescueException | ||
running_requirement_tasks.each do |t| | ||
t.failed_event.emit(e) | ||
end | ||
e | ||
NetworkGeneration::SystemNetworkPlanApplyResult.new( | ||
fulfilled: false, errors: [e] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be returning a SystemNetworkPlanApplyResult that has the same instances as before the resolution, and a proper list of errors attached to the new ones.
tasks_to_instanciate.map(&:planning_task), | ||
validate_generated_network: false, | ||
early_deploy: false | ||
) | ||
unless resolution_errors.empty? | ||
resolution_errors.each do |error| | ||
t = error.planned_task |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
planning_task
?
elsif on_error == :commit | ||
work_plan.commit_transaction | ||
else | ||
errors.each { |err| real_plan.execution_engine.add_error(err) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conclusion: should be removed.
Syskit.conf.early_deploy = true | ||
Syskit.conf.capture_errors_during_network_resolution = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decouple the two flags, and run 4 test suites (plain core, with early deploy, with capture errors and with both)
Depends on:
We introduced a ResolutionError to mark errors during the new plan
network resolution shouldnt be raised, which causes the whole transation
to fail. Instead, we capture them and fail the deployment of the
specific tasks that caused them.